Add pointwise indexing via isel_points method #481

Merged: 7 commits from the feature/isel_points branch merged into pydata:master on Jul 27, 2015
Conversation

@jhamman (Member) commented Jul 20, 2015

This provides behavior equivalent to numpy slicing with multiple lists.

Example

>>> da = xray.DataArray(np.arange(56).reshape((7, 8)), dims=['x', 'y'])
>>> da
<xray.DataArray (x: 7, y: 8)>
array([[ 0,  1,  2,  3,  4,  5,  6,  7],
       [ 8,  9, 10, 11, 12, 13, 14, 15],
       [16, 17, 18, 19, 20, 21, 22, 23],
       [24, 25, 26, 27, 28, 29, 30, 31],
       [32, 33, 34, 35, 36, 37, 38, 39],
       [40, 41, 42, 43, 44, 45, 46, 47],
       [48, 49, 50, 51, 52, 53, 54, 55]])
Coordinates:
  * x        (x) int64 0 1 2 3 4 5 6
  * y        (y) int64 0 1 2 3 4 5 6 7
>>> da.isel_points(x=[0, 1, 6], y=[0, 1, 0])
<xray.DataArray (points: 3)>
array([ 0,  9, 48])
Coordinates:
    y        (points) int64 0 1 0
    x        (points) int64 0 1 6
  * points   (points) int64 0 1 2
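
For comparison, the same values with plain numpy fancy indexing (a quick check against the array above):

>>> da.values[[0, 1, 6], [0, 1, 0]]
array([ 0,  9, 48])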

related: #475

# all the indexers should be iterables
keys = indexers.keys()
indexers = [(k, ([v] if not isinstance(v, Sequence) else v))

shoyer (Member):

Probably better to raise an error if someone tries to pass in something that isn't a sequence? I would probably coerce with np.asarray and then raise if v.dtype.kind != 'i' or v.ndim != 1.
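
A minimal sketch of the suggested check (the helper name here is hypothetical, not code from this PR):

import numpy as np

def _assert_pointwise_indexer(v):
    # coerce to an array and reject anything that is not a 1d integer array;
    # note that a slice coerces to a 0d object array, so it is rejected too
    v = np.asarray(v)
    if v.dtype.kind != 'i' or v.ndim != 1:
        raise ValueError('pointwise indexers must be 1d integer arrays')
    return v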

jhamman (Member Author):

My thought was that you may not know the shape of x and y a priori, but I'm happy to remove this. After thinking about how numpy treats scalar vs. array indices (example below), I think it is best to remove this and raise an error.

In [1]: x = np.arange(12).reshape((3, 4))

In [2]: x[2, 3]
Out[2]: 11

In [3]: x[[2], [3]]
Out[3]: array([11]) 

jhamman (Member Author):

I've been trying to implement this and I'm not getting it to work. Specifically, how do we want to handle slice objects? As I think about it more, I'm not sure how my first commit was working with slice objects as indexers.

Any thoughts on the best way to handle this?

shoyer (Member):

I would raise an error with slice objects, too. 1d arrays of integers is enough functionality.


jhamman (Member Author):

Okay. That is done in my last commit so no further change needed here.

@shoyer added this to the 0.6 milestone on Jul 20, 2015

actual = da.isel_points(y=y, x=x, dim='test_coord')
assert 'test_coord' in actual.coords
assert actual.coords['test_coord'].shape == (len(y), )

shoyer (Member):

I would also verify that x and y are still coordinates, along the test_coord dimension.

Probably easier just to construct the expected data-array and then compare them with self.assertDataArrayIdentical.
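
For instance, a rough sketch of that pattern, reusing the 7x8 example array from the PR description (the exact coordinates on the expected result depend on the final implementation):

x = [0, 1, 6]
y = [0, 1, 0]
# build the expected result by hand: x and y become coordinates along the
# new 'test_coord' dimension
expected = xray.DataArray([0, 9, 48], dims='test_coord',
                          coords={'x': ('test_coord', x),
                                  'y': ('test_coord', y)})
actual = da.isel_points(x=x, y=y, dim='test_coord')
self.assertDataArrayIdentical(expected, actual)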

jhamman (Member Author):

done.

@shoyer (Member) commented Jul 20, 2015

Very nice! Really looking forward to this one 👍.

return concat([self.isel(**d) for d in
               [dict(zip(keys, inds)) for inds in
                zip(*[v for k, v in indexers])]],
              dim=dim)
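
For reference, the nested comprehension above is roughly equivalent to this expanded loop (a sketch for readability, not code from the PR):

# keys: list of indexed dimension names, e.g. ['x', 'y']
# indexers: list of (dim, values) pairs, e.g. [('x', [0, 1, 6]), ('y', [0, 1, 0])]
selections = []
for inds in zip(*[v for k, v in indexers]):
    # one integer index per dimension, i.e. one point
    selections.append(self.isel(**dict(zip(keys, inds))))
return concat(selections, dim=dim)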

shoyer (Member):

Let's explicitly provide the coords and data_vars arguments to concat:

indexer_dims = set(indexers)

def relevant_keys(mapping):
    return [k for k, v in mapping.items()
            if any(d in indexer_dims for d in v.dims)]

data_vars = relevant_keys(self.data_vars)
coords = relevant_keys(self.coords)

This means concat doesn't need to look at any data to figure out which variables should be concatenated (vs. variables which were not indexed).

A good test case here would be a dataset that has a constant variable, e.g.,

ds = xray.Dataset({'x': range(5), 'y': 0})

If you do ds.isel_points(x=[0, 1, 2]), y should still be a scalar.
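
A sketch of that test, assuming the behaviour described above:

ds = xray.Dataset({'x': range(5), 'y': 0})
actual = ds.isel_points(x=[0, 1, 2])
# 'y' was not indexed along any of the indexer dimensions,
# so it should remain a scalar variable
assert actual['y'].ndim == 0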

jhamman (Member Author):

test added in test_dataset.py

----------
dim : str or DataArray or pandas.Index, optinal
Name of the dimension to concatenate along. This can either be a
new dimension name, in which case it is added along axis=0, or an

shoyer (Member):

Existing dimension names are not valid choices for sel_points, I think. Actually, that's probably an edge case worth writing a test for.

jhamman (Member Author):

done.

shoyer (Member):

You still need to update the docstring here.

raise ValueError('All indexers must be the same length')

# Existing dimensions are not valid choices for the dim argument
if dim in self.dims:

shoyer (Member):

This will fail if you pass in an array for the dimension argument -- which also still needs a test.

concat might actually already do reasonable error handling here. If not, you'll need to make use of _calc_concat_dim_coord: https://github.com/xray/xray/blob/cb51b359d213e711208f2747ffc5ab4acb25dc4d/xray/core/combine.py#L118-L137

jhamman (Member Author):

Indeed. Happy to write that test. However, I'm actually having trouble picturing the use case here. Do you have an example?

shoyer (Member):

I have an xray Dataset ds_stations with station data with dimensions (station, time). Now I pointwise index into a 2D grid with something like: ds_grid.sel_points(latitude=ds_stations.latitude, longitude=ds_stations.longitude, dim=ds_stations.station, method='nearest').

Now the station dimension in the result is labeled by station IDs, not sequential integers.

shoyer (Member):

On the other hand, I might not even need the dim argument in this case, given that the metadata is already contained in the ds_stations.latitude and ds_stations.longitude DataArray objects.

@jhamman (Member Author) commented Jul 26, 2015

@shoyer - This last commit should handle DataArray and list-like objects for the dim argument. Take a look and let me know what you think. It took me a while to wrap my head around the issue, but in the end, I think the solution was pretty straightforward.
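
For illustration, a small sketch of what a DataArray dim argument enables, reusing the 7x8 example from the description (the station labels are made up):

stations = xray.DataArray(['A', 'B', 'C'], dims='station', name='station')
result = da.isel_points(x=[0, 1, 6], y=[0, 1, 0], dim=stations)
# the new dimension should be labeled by the station names instead of 0, 1, 2
print(result['station'].values)  # expected: ['A' 'B' 'C']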

@@ -134,6 +135,21 @@ __ http://legacy.python.org/dev/peps/pep-0472/
# this is safe
arr[dict(space=0)] = 0

Pointwise indexing
--------------------------------

shoyer (Member):

you need to have the same number of dashes as the length of the line above.


Parameters
----------
dim : str or DataArray or pandas.Index or other list-like object, optinal

shoyer (Member):

spelling: optinal -> optional

@shoyer (Member) commented Jul 27, 2015

This needs a couple of small fixes to the docs but otherwise looks ready to merge!

Something to consider: maybe indexing like ds.isel_points(x=stations.x, y=stations.y) should automatically determine the dim argument from the stations.x DataArray?

We could certainly add that later, and it makes more sense once we have sel_points anyways.

@jhamman (Member Author) commented Jul 27, 2015

Okay, thanks. I made some minor revisions to the docs and docstrings.

I think we should wait on the automatic discovery of the dimension name. I think adding sel_points will add a few more use cases for isel_points and may make it clearer how to determine the dim name.

shoyer added a commit that referenced this pull request on Jul 27, 2015: Add pointwise indexing via isel_points method
@shoyer merged commit 69f7386 into pydata:master on Jul 27, 2015
@shoyer (Member) commented Jul 27, 2015

woohoo! thanks again for putting this together :)

@jhamman deleted the feature/isel_points branch on July 27, 2015 05:37